A Novel Mis-Seg-Focus Loss Function Based on a Two-Stage nnU-Net Framework for Accurate Brain Tissue Segmentation

Brain tissue segmentation plays a critical role in the diagnosis, treatment, and study of brain diseases. Accurately identifying these boundaries is essential for improving segmentation accuracy. However, distinguishing boundaries between different brain tissues can be challenging, as they often overlap. Existing deep learning methods primarily calculate the overall segmentation results without adequately addressing local regions, leading to error propagation and mis-segmentation along boundaries. In this study, we propose a novel mis-segmentation-focused loss function based on a two-stage nnU-Net framework. Our approach aims to enhance the model’s ability to handle ambiguous boundaries and overlapping anatomical structures, thereby achieving more accurate brain tissue segmentation results. Specifically, the first stage targets the identification of mis-segmentation regions using a global loss function, while the second stage involves defining a mis-segmentation loss function to adaptively adjust the model, thus improving its capability to handle ambiguous boundaries and overlapping anatomical structures. Experimental evaluations on two datasets demonstrate that our proposed method outperforms existing approaches both quantitatively and qualitatively.


Introduction
Brain tissue segmentation is a pivotal medical image processing technique employed to delineate and distinguish various tissues within the brain, including gray matter, white matter, and cerebrospinal fluid [1].Its significance lies in its contribution to the diagnosis, treatment, and investigation of brain disorders, facilitating more precise identification of brain lesions and the formulation of effective treatment strategies [2].Thus, precise brain tissue segmentation holds paramount importance in both medical research and clinical practice, as it not only deepens our comprehension of brain disorders but also fosters the refinement of diagnostic approaches and therapeutic interventions.
Current deep learning methods in medical image segmentation primarily use loss functions to measure the disparity between network predictions and Ground-Truth labels, guiding parameter updates [3].However, these loss functions typically concentrate on segmentation in the whole image, calculating loss in a global region, which is known as global loss.The application of global loss often leads to the neglect of certain local regions in the image, including boundaries or mis-segmentation regions.Existing deep learning methods prioritize global region feature extraction and overall segmentation optimization, neglecting local region information at boundaries in magnetic resonance imaging (MRI), causing segmentation results to appear blurry or discontinuous.The complexity of segmentation tasks in brain tissue segmentation is due to the adjacency of different tissues and varying intensity characteristics, posing challenges for traditional loss functions.Consequently, segmentation errors such as mis-segmentation, omission, or over-segmentation at boundaries compromise overall segmentation accuracy.In short, despite significant progress, their focus is on the global region, neglecting local regions where segmentation errors often occur.
The segmentation of brain tissue relies heavily on local regions due to overlapping boundaries between different brain tissues.The local region is crucial for accurate brain tissue segmentation tasks.Gray matter, white matter, and cerebrospinal fluid often exhibit overlapping intensities and textures in MRI images, complicating the distinction between them.The boundaries between gray and white matter are typically indistinct, as they share similar intensity and texture characteristics.Currently, some methods calculate loss based on local regions (known as local loss), often focusing on boundary regions for loss calculation.Although boundary regions are frequently prone to errors in brain tissue segmentation, the regions where mis-segmentations occur are not confined to these edges alone.Therefore, it is essential to conduct research on loss calculation methods specifically targeting mis-segmentation.
The aim of this study is to address the limitations of existing deep learning methods in brain tissue segmentation, particularly regarding the oversight of crucial information at image boundaries.Recognizing the significance of accurate boundary delineation, especially in MRI images for diagnosing brain tissue abnormalities, our research endeavors to design a novel loss function that effectively leverages both global and local regional information based on an improved nnU-Net baseline architecture.The two-stage training strategy serves to address segmentation errors effectively.In the first stage, the network is optimized by calculating the loss of the global region, while the second stage involves defining a mis-segmentation loss function to adaptively adjust the model, thus improving its capability to handle ambiguous boundaries and enhancing segmentation accuracy.
The main contributions are as follows: • We propose a novel mis-seg-focus (MSF) loss function that guides the network to focus on mis-segmented regions, improving the segmentation accuracy of mis-segmented regions.This allows for more precise segmentation of brain tissues.• We propose a two-stage training strategy.In the first stage, the network undergoes training using global loss to grasp overall structural information.In the second stage, we use the MSF loss, enabling the network to train based on both global and local information.The global loss assists the network in capturing global information, while the MSF loss focuses on refining the mis-segmentation regions.• We validate the efficiency of the proposed method on different datasets, including a dHCP dataset about infants and an OASIS dataset about the elderly population.

Related Work
Traditional brain tissue segmentation methods mostly rely on image processing techniques and machine learning algorithms, such as thresholding, region growth, and graph cuts.These methods achieve segmentation through manually designed features and rules.Tools like FSL and FreeSurfer [4,5], for instance, employ such methods to process adult brain tissue images.Additionally, there are tools specifically designed for pediatric brain tissue segmentation, such as iBEAT [6].However, the effectiveness of these traditional methods is often influenced by factors like image quality and noise, and they exhibit certain limitations when dealing with complex scenarios involving blurry boundaries and overlapping anatomical structures.
With the rapid development of deep learning, significant progress has been made in brain tissue segmentation methods based on deep learning [1].These methods use deep learning models such as Convolutional Neural Networks (CNNs) to learn feature representations from large amounts of data, enabling them to automatically learn more dis-criminative features and achieve better performance in brain tissue segmentation tasks [7][8][9].In order to address the challenges encountered in brain tissue segmentation, researchers are continuously exploring new methods.Currently, the application of deep learning in brain tissue segmentation primarily relies on U-Net and its derivative networks [10][11][12][13], such as dual-encoder U-Net [14], dynamic U-Net [15], and U-Net integrated with fuzzy information [16].Additionally, there are brain tissue segmentation networks based on improved Transformer structures, including Swin UNETR [17], which can extract features at different resolutions, and Transformer structures with varying weight configurations [18].Fabian Isensee and his colleagues have developed a self-configuring method called nnU-Net based on U-Net [19].On brain tissue segmentation datasets, the excellent architecture of nnU-Net enables it to achieve outstanding performance, demonstrating its strong adaptability and efficiency [20].Despite the remarkable achievements of deep learning in the field of brain tissue segmentation, it still faces some challenges.Current deep learning methods tend to overlook mis-segmentation regions when processing segmentation results, which leads to suboptimal performance in handling complex scenarios such as blurry boundaries and overlapping anatomical structures.
In current brain tissue segmentation tasks, adjustments to the loss function have been shown to effectively improve the segmentation performance of the network.For instance, Zhang, X. and his team have introduced contrastive learning loss, using feature maps as key inputs for contrastive learning [21].Meanwhile, Wan, X. and others use preset total loss functions to update the parameters of the student model [22].Additionally, calculating loss based on edge regions is also a commonly used approach [23,24].However, these methods still need improvement in guiding the network to more precisely focus on mis-segmentation regions at the edges of multi-label brain tissue.Two-stage training methods have demonstrated their strengths in image segmentation, particularly in refining edge regions and enhancing segmentation accuracy.Typically, these methods involve a first stage of rough segmentation followed by a second stage of fine segmentation, making them suitable for challenging tasks such as vessel segmentation [25].Additionally, nnU-Net incorporates a two-stage cascaded U-Net architecture that can further refine the coarse segmentation results obtained in the first stage.However, the current two-stage segmentation methods still need improvement in their ability to precisely segment mis-segmentation regions.

Overall Architecture
The architecture of the proposed framework is shown in Figure 1.The proposed model is based on nnU-Net.We improve the model structure by employing deeper filters to enhance feature extraction capabilities.We have proposed a novel mis-seg-focus (MSF) loss function, implemented by using a two-stage training strategy.In the first stage, we train the model using a global loss to locate mis-segmentated regions.In the second stage, we employ the mis-segmentation region extraction block (MS block).This block, at the end of each training epoch, calculates the intersection over Union between the network predictions and the corresponding Ground-Truth labels, thereby precisely extracting mis-segmentation regions for each label.Subsequently, we perform dilation on these regions and merge the mis-segmentation regions from different brain tissues.Finally, we compute the loss on these processed mis-segmentation regions, incorporating both global loss and our proposed MSF loss for the second stage of training.

Improved Network Architecture
The nnU-Net framework is an improved version based on the U-Net architecture, featuring a U-shaped design consisting of an encoder and a decoder.We have made further adjustments to the network architecture based on the original nnU-Net framework, deepening the overall network structure and increasing the number of filters in the encoder.In addition, we have adopted an asymmetric network design to be er adapt to multi-label datasets, thereby enhancing the network's capacity and expressive ability.The network incorporates a five-round downsampling dual convolution process.Notably, during the first three rounds of downsampling, the depth of the filter layers gradually doubles, significantly increasing the total number of network parameters.Furthermore, skip connections are utilized to fuse feature information from different network levels, improving feature extraction efficiency.Simultaneously, the deep supervision module collects feature maps of different sizes generated during the downsampling process and integrates this information into the loss function calculation.

Mis-Segmentation Region Extraction
The extraction of mis-segmentation regions in the MSF loss is accomplished by the MS region extraction module.There are some specific regions formed by mis-segmentation between the network result and the Ground Truth.In order to more accurately identify and extract these regions, we utilize the rich pixel information contained in the last three stages of the upsampling process.We extract feature maps from these stages and apply the sigmoid function to compress the values between 0 and 1, making it easier to

Improved Network Architecture
The nnU-Net framework is an improved version based on the U-Net architecture, featuring a U-shaped design consisting of an encoder and a decoder.We have made further adjustments to the network architecture based on the original nnU-Net framework, deepening the overall network structure and increasing the number of filters in the encoder.In addition, we have adopted an asymmetric network design to better adapt to multi-label datasets, thereby enhancing the network's capacity and expressive ability.The network incorporates a five-round downsampling dual convolution process.Notably, during the first three rounds of downsampling, the depth of the filter layers gradually doubles, significantly increasing the total number of network parameters.Furthermore, skip connections are utilized to fuse feature information from different network levels, improving feature extraction efficiency.Simultaneously, the deep supervision module collects feature maps of different sizes generated during the downsampling process and integrates this information into the loss function calculation.

Mis-Segmentation Region Extraction
The extraction of mis-segmentation regions in the MSF loss is accomplished by the MS region extraction module.There are some specific regions formed by mis-segmentation between the network result and the Ground Truth.In order to more accurately identify and extract these regions, we utilize the rich pixel information contained in the last three stages of the upsampling process.We extract feature maps from these stages and apply the sigmoid function to compress the values between 0 and 1, making it easier to represent probabilities or perform binarization.Finally, the processed feature maps are output to the MS region extraction module to locate and extract the mis-segmentation regions in the MSF loss.
To facilitate region calculation and the extraction of mis-segmentation, the feature layer is first processed using argmax to convert it into three-dimensional data y ′ n .As shown in the global block diagram in Figure 1, the y n data are separated by labels, where N represents the number of labels, resulting in N label output results from y ′ 1 to y ′ N .The same operation is performed on the Ground Truth (GT) to obtain GT y 1 to y N for N labels.Label extraction is performed separately for the N labels, yielding the mis-segmented region M n for the nth label, as shown in Equation (1).
Due to the need for partial global regions during region calculation, the mis-segmentation regions of each label are dilated to obtain M dn .The mis-segmentation regions of all labels are superimposed to obtain the global mis-segmentation region, as shown in Equation ( 2). (2)

Mis-Seg-Focus (MSF) Loss Function
Mis-seg-focus (MSF) loss is a method of calculating loss specifically targeted at missegmentation regions.By utilizing this approach, the network can more effectively concentrate on those areas that have been incorrectly segmented, thereby enhancing the accuracy of segmentation.
Set A as the set of all pixel points in the whole MRI image, and extract M d as the set of all discrete segmentation error regions from the MS region, which is a subset of A, M d ⊂ A. Therefore, the network output and GT for the mis-segmentation region are ŷ′ and ŷ′ , as shown in Equations ( 3) and (4).The y ′ and y regions in the formula belong to the global region set A, while the ŷ′ and ŷ regions belong to the mis-segmentation region set The segmentation loss function uses a compound loss consisting of Dice loss and Cross Entropy loss.The segmentation loss function is shown in Equation (5).The global region loss calculation is shown in Equation ( 6), and the MSF loss calculation is shown in Equation (7).
As shown in Equation ( 5), y 1 and y 2 in the loss function L loss represent the Ground Truth and the network's prediction results in the region, respectively, where weights ω are used to balance the three loss functions.The Dice loss function is shown in Equation (8), and the CE loss function is shown in Equation (9).
where a j in Equation ( 5) represents the weight of the deep supervision module, and the specific formula is shown in Equation (10), where dn represents the total number of layers in the deep supervision module, and j = 1 represents the first layer in the decoding stage.The feature maps in the decoding stage are all output through Sigmoid and added to the loss function, and their weights increase as the size of the upsampling feature map increases, allowing gradient information to be injected deeper into the network. (10)

Two-Stage Training Strategy
By implementing a two-stage training strategy, we can effectively handle difficult regions in segmentation.
In the 3D cascade network in nnU-Net, although the first U-Net operates on downsampling images and the second U-Net can be trained to refine the segmentation map created at full resolution by the previous one, it still only performs segmentation on the global target of the previous stage, without significant improvement for difficult segmentation of tissue edges.To address this issue, we adopted a two-stage segmentation network joint training strategy.
In the first stage, the network undergoes global training to initially determine the segmentation results, using a global loss to perform coarse segmentation on images, as shown in Equation (11).When evaluation metrics such as Dice and HD95 cannot be improved, the training set has been fully fitted.At this point, there are some difficult-todistinguish regions in the mis-segmentation and global areas.Through matrix calculation of the network output results and Ground Truth, the mis-segmentation regions are formed.In this study, the first-stage training epoch is 300.
In the second stage, an MSF loss function is used in the network to constrain and correct the erroneous segmentation regions that appeared in the first stage.A novel MSF loss is proposed, as shown in Equation (12).The proportion of global loss and MSF loss is weighted through the weight λ.This staged approach enables the network to more accurately handle complex segmentation tasks, especially in difficult regions of tissue boundaries.

Datasets
Two sets of data were used to evaluate the feasibility of the proposed method.One is from the Developing Human Connectome Project (dHCP), and the other is from the Open Access Series of Imaging Studies (OASIS).Informed consent was obtained from the imaging subjects in compliance with the Institutional Review Board.
The dHCP dataset is an international collaborative project aimed at studying infant brain development.All infants were born at term (37-44 weeks of gestation) and underwent imaging.It included T1 modal data from 40 neonatal subjects, axial slice stacks with an in-plane resolution of 0.8 × 0.8 mm 2 , and 1.6 mm slices overlapping by 0.8 mm.The Ground Truth contains nine labels, including gray matter, white matter, cerebrospinal fluid, background, ventricles, cerebellum, deep gray matter, brainstem, and hippocampus.The dataset is split into training and testing sets in a 4:1 ratio, and all training data underwent 5-fold cross-validation.
The OASIS dataset is a collection of MRI datasets used to study brain structure and function in the adult population.This group comprises 416 subjects aged between 18 and 96 years.Each subject has three or four separate T1-weighted MRI scans obtained during a single scanning session.All the subjects are right-handed and include both males and females.The image dimensions of the OASIS dataset are 208 × 176 × 176 (height × width × slices), where each axial scan encompasses 176 slices.The Ground Truth contains four labels, including gray matter, white matter, cerebrospinal fluid, andcortex.The dataset is split into training and testing sets in a 4:1 ratio, and all training data underwent 5-fold cross-validation.

Evaluation Metrics
We adopted the Dice coefficient (Dice) and the 95th percentile of the Hausdorff distance (HD95) as the primary evaluation metrics to comprehensively evaluate the performance of the proposed methods.
The Dice coefficient, as an effective indicator for measuring the similarity between two samples, is used to quantify the degree of overlap between the model prediction result and the true label, thereby accurately reflecting the segmentation accuracy of the model.This indicator pays particular attention to the consistency of the overall pixel distribution, that is, the degree of agreement between the model prediction result and the actual brain tissue structure.
The HD95 focuses on measuring the fineness of the model in boundary processing.It reflects the model's ability to grasp boundary details by calculating the maximum mismatch between the model segmentation result and the true value, that is, the furthest distance between two sets of points.Since Hausdorff distance is extremely sensitive to boundary anomalies in segmentation results, HD95 can effectively reveal minor deviations in the boundary processing of the model.

Implementation Details
Our improved model is implemented in PyTorch 2.0, a GPU-based neural network framework, and optimized based on the nnU-Net integration framework.Code testing and network training were performed on Ubuntu 20.04 systems and accelerated using NVIDIA RTX3090Ti (with 24GB of video memory) GPUs.The following are the hyperparameter settings for the experiment.
The network training framework uses the 3D U-Net network framework in nnU-Net, and the optimizer uses Adam with a Leaky ReLU activation function.The initial learning rate is 3 × 10 −4 , and there are 250 batches per epoch with a batch size of two.The learning rate training strategy is to calculate the exponential moving average loss of the training set and the validation set.If the exponential moving average loss of the training set does not decrease by 5 × 10 −3 within the last 30 epochs, the learning rate is attenuated by a factor of five.

Comparison Experiment 4.4.1. Experiment Settings
We employed several state-of-the-art deep learning methods (including 3D U-Net, DynUNet, and Swin UNETR) that are based on the U-Net structure, all of which feature 3D network architectures.Additionally, we utilized two publicly available datasets: the dHCP dataset, containing nine labels (including gray matter, white matter, cerebrospinal fluid, background, ventricles, cerebellum, deep gray matter, brainstem, and hippocampus), and the OASIS dataset, containing four labels (including gray matter, white matter, cerebrospinal fluid, and cortex).To ensure the fairness of the experiments, all methods used the same compound loss function for network training, with 600 training epochs.Due to the difficulty of ensuring fairness in data augmentation and network parameters among various algorithms, we tried to use the original source code in the comparative experiments and adopted similar parameter settings.

Segmentation Results on dHCP
As shown in Table 1, the performance of the original nnU-Net parameter framework is compared with the enhanced nnU-Net network parameter framework (referred to as DP, shorthand for "deep network").The experimental results indicate that across various tissues, the deep network exhibits significant improvements in both the Dice coefficient and HD95 metrics.Furthermore, when MSF loss (MSF) is incorporated alongside DL, the experimental outcomes are further enhanced.Compared to the original nnU-Net, our proposed method achieves an improvement of 0.4 percentage points in the mean Dice metric and a reduction of 1.3 in the mean HD95 metric, with notable improvements across all tissues.Especially for small labeled tissues, like the hippocampus and ventricles, the segmentation performance is optimal.When compared to other state-of-the-art deep learning approaches and commonly used traditional methods (BET2), our method outperforms them by 5.8-8.8 percentage points in the mean Dice metric and achieves a reduction of 5.3-9.6 in the mean HD95.These findings convincingly demonstrate the superiority of our approach to brain tissue segmentation tasks.As shown in the visualization results in Figure 2, we performed visualization on the coronal, sagittal, and axial planes and presented the multi-label segmentation results and mis-segmentation results (red areas) of the network.Due to nnU-Net's excellent postprocessing module, the edges of its segmentation results are smoother compared to other segmentation methods.The segmentation results obtained by our proposed method are very close to the Ground-Truth labels, with good edge segmentation effects, especially in the multi-label intersection areas.By comparing the red mis-segmentation areas, it can be clearly seen that our method produces the fewest mis-segmentation areas compared to other comparison methods.

Segmentation Results on OASIS
To further validate the generalization capability of our two-stage nnU-Net training method based on MSF loss, we tested our model on the OASIS dataset.The OASIS brain tissue segmentation has a total of four segmentation labels, namely gray matter, white matter, cerebrospinal fluid, and cortex, with the last column representing the average of the four labels.As shown in Table 2, our method achieved the best segmentation performance in terms of average Dice and average HD95 compared to other segmentation networks.Compared to segmentation networks such as DynUNet and Swin UNTER, the Dice improved by approximately 2 to 3 percentage points, while the HD95 varied between approximately 0.2 and 3.63.Compared to the original nnU-Net framework, our method improved Dice and HD95 by 0.61 and 0.5032, respectively, and there was an improvement in the evaluation metrics for each label.As shown in the visualization results in Figure 3, we have performed visualization on the coronal, sagittal, and axial planes and presented the multi-label segmentation results and mis-segmentation results (red areas) of the network.These results demonstrate the effectiveness of our proposed method on different multi-label brain tissue segmentation tasks, improving the network's edge segmentation capability and enhancing the accuracy of brain tissue segmentation.

. Design of MSF Loss Function
To verify the effectiveness of the MSF loss and determine the optimal hyperparameter settings for mis-segmentation regions, we conducted a series of ablation experiments on MSF region extraction.Under the same training strategy and network hyperparameters, we trained multiple network models.As shown in Table 3, by comparing the results of nnU-Net and the best results, we confirmed that the MSF loss can improve the performance of the model, as reflected in a 0.31 increase in the Dice index and a 0.36 increase in the HD95 index.In the process of exploring the hyperparameters for mis-segmentation regions, we tried three different settings for the expansion coefficient: an expansion coefficient of 2, an expansion coefficient of 5, and an expansion coefficient of 10.After detailed testing, we found that the setting of the expansion coefficient of 2 weighted performed the best on the evaluation metrics, especially on the HD95 index, achieving good parameter optimization.We conducted a series of ablation experiments covering Dice loss, CE loss, and joint loss functions in order to determine the optimal brain tissue segmentation loss.We maintained the same nnU-Net network architecture, removed the influence of network architecture and two-stage training strategies, and relied solely on different loss functions for network training.For each experiment, we applied different loss functions to calculate the Dice evaluation metrics in Table 4 and visually displayed the comparison results in Figure 4.The joint loss Dice and CE performed best in terms of Dice scores across all tissues.We experimented with four different values for the weight λ of the mis-segmentation loss function, including 0.5, 1, 10, and 100.As shown in Figure 4, the best results were achieved when the weight was set to 1.To verify the effectiveness of the two-stage training strategy, we conducted tests under the same network parameters and hyperparameters for MSF loss, with both the first and second stages trained for 300 epochs.As shown in Table 5, by comparing Experiment 1 and Experiment 2, we observed that the network had already achieved a state of training fit, indicating that the evaluation metrics could not be further improved.Next, we tested the scenarios of using only global loss and only MSF loss in the second stage separately, and the results showed that the segmentation performance in both cases was inferior to using both global and MSF losses.In Experiment 4, although both stages employed a combination of global and MSF losses for training, the large mis-segmentation regions in the first stage led to a high variation rate in loss values, which affected the final segmentation results.In summary, it can be concluded that using global loss in the first stage and adopting MSF loss functions in the second stage can achieve the best segmentation results.As shown in Figure 5, the first column in the figure shows the mis-segmentation regions of the whole brain (represented in red), while the second to fourth columns present the segmentation results for three tissues: the hippocampus, ventricle, and white matter, with yellow indicating correctly segmented regions and red indicating mis-segmentation regions.In the first stage, we adopted the segmentation results from the 100th epoch and the last epoch, and it can be clearly seen that there are still many mis-segmentation regions represented in red in the nnU-Net training results, which constitute the MSF loss regions.In the second stage, we applied both global and MSF losses, and it can be observed that as the epochs increase, the originally difficult-to-converge mis-segmentation regions gradually decrease.This observation strongly validates the effectiveness of combining global and MSF losses, as well as the two-stage training strategy.

Ablation Study on Network Architecture
To verify the feasibility of the improved network framework, we conducted ablation experiments.In this part of the experiment, we adopted the same training strategy and loss function.As shown in Table 6, our proposed method achieved be er segmentation results compared to the original network framework, specifically with a 0.9 increase in the Dice coefficient, and corresponding improvements were observed in every tissue.

Ablation Study on Network Architecture
To verify the feasibility of the improved network framework, we conducted ablation experiments.In this part of the experiment, we adopted the same training strategy and loss function.As shown in Table 6, our proposed method achieved better segmentation results compared to the original network framework, specifically with a 0.9 increase in the Dice coefficient, and corresponding improvements were observed in every tissue.

Discussion
The dHCP dataset is specifically designed for labeled data collection for infant brain tissues, while the OASIS dataset provides labeled data for adult brain tissues.Through comparative experimental validation, our proposed method has achieved optimal performance on both of these publicly available datasets, fully demonstrating its wide applicability.Since the adult brain has fully developed, with high contrast between tissues, our method performs well in segmenting adult brain tissues.However, segmenting infant brain tissues poses more challenges due to their ongoing development and relatively small differences among various tissues, especially when it comes to smaller tissue segments.Nonetheless, our method is still capable of effectively segmenting infant brain tissues.
The MSF loss function focuses the network's attention on mis-segmentation regions by calculating losses specifically for those areas.We have chosen Dice and Cross Entropy losses to calculate the losses for both mis-segmentation and global regions.Combined with a two-stage training strategy, the first stage involves training with global loss, enabling the network to learn global information until the evaluation metrics can no longer be improved.In the second stage, both global loss and MSF loss are used for training, allowing the network to acquire both global and local information simultaneously.Here, "local information" refers specifically to the mis-segmentation regions.After the first stage, there may be some mis-segmentation regions along tissue edges.Since this stage employs commonly used global loss functions, the network tends to focus on global information.In the second stage, the network is trained using both global and MSF losses, enabling it to concentrate on the mis-segmentation regions and, thereby, improve segmentation accuracy.
However, due to significant size differences among various brain tissues, especially for smaller structures like the hippocampus, which require high segmentation accuracy, we can adopt different mis-segmentation region extraction methods based on the size of the brain tissue in subsequent work.

Conclusions
Experimental results demonstrate that our proposed two-stage nnU-Net framework training method with MSF loss achieves optimal segmentation outcomes in brain tissue segmentation tasks.By adopting a deeper and wider network architecture, the model captures more intricate details and features in multi-label brain tissue segmentation.Compared to other advanced deep learning methods, including the original nnU-Net, our approach proves its effectiveness on publicly available datasets such as dHCP and OASIS, with improvements across various evaluation metrics.This indicates that our method excels at handling complex and variable brain tissue structures, enhancing segmentation accuracy.Notably, it stands out when dealing with intricate and diverse brain anatomies.Future work can explore optimizing the loss function and enhancing the generalization capabilities of the network model to tackle more challenging brain tissue segmentation tasks.

Figure 1 .
Figure 1.Overall architecture of the multi-label brain tissue segmentation method.It includes the enhanced nnU-Net main network framework (comprising nnU-Net network preprocessing, an enhanced nnU-Net network framework, and nnU-Net network post-processing), network framework information description, and MS region extraction section.

Figure 1 .
Figure 1.Overall architecture of the multi-label brain tissue segmentation method.It includes the enhanced nnU-Net main network framework (comprising nnU-Net network preprocessing, an enhanced nnU-Net network framework, and nnU-Net network post-processing), network framework information description, and MS region extraction section.

Table 1 .
Quantitative evaluation results of the segmentation on the dHCP dataset.The results of advanced deep learning methods and our proposed method with 5-fold cross-validation.Using Dice and HD95 as evaluation metrics, we show the segmentation results and mean results for nine brain tissues and present the best results in a deep representation.

Figure 2 .
Figure 2. Visualization of the segmentation results on the dHCP dataset.Advanced deep learning brain tissue segmentation methods were compared with the nnU-Net baseline network and our proposed method.Each plane demonstrates the segmentation results of different methods, with the mis-segmentation regions (shown in red) displayed beneath the segmentation outcomes.

Figure 2 .
Figure 2. Visualization of the segmentation results on the dHCP dataset.Advanced deep learning brain tissue segmentation methods were compared with the nnU-Net baseline network and our proposed method.Each plane demonstrates the segmentation results of different methods, with the mis-segmentation regions (shown in red) displayed beneath the segmentation outcomes.

Figure 3 .
Figure 3. Visualization of the segmentation results on the OASIS dataset.Advanced deep learning brain tissue segmentation methods were compared with the nnU-Net baseline network and our proposed method.Each plane demonstrates the segmentation results of different methods, with the mis-segmentation regions (shown in red) displayed beneath the segmentation outcomes.

4. 5 .
Ablation Study4.5.1.Design of MSF Loss FunctionTo verify the effectiveness of the MSF loss and determine the optimal hyperparameter se ings for mis-segmentation regions, we conducted a series of ablation experiments

Figure 3 .
Figure 3. Visualization of the segmentation results on the OASIS dataset.Advanced deep learning brain tissue segmentation methods were compared with the nnU-Net baseline network and our proposed method.Each plane demonstrates the segmentation results of different methods, with the mis-segmentation regions (shown in red) displayed beneath the segmentation outcomes.

Figure 4 .
Figure 4. Experimental results with different weights.The ordinate of the coordinate chart is Dice as an evaluation metric, which shows a total of nine kinds of brain tissue and average results.

Figure 5 .
Figure 5.Comparison of the mis-segmentation regions in different epochs of the two stages on the dHCP dataset.Three types of brain tissues (hippocampus, ventricle, and white ma er) and the whole brain are visualized.The last epoch is the final round result of this stage.The red region is the mis-segmentation region.

Figure 5 .
Figure 5.Comparison of the mis-segmentation regions in different epochs of the two stages on the dHCP dataset.Three types of brain tissues (hippocampus, ventricle, and white matter) and the whole brain are visualized.The last epoch is the final round result of this stage.The red region is the mis-segmentation region.

Table 2 . Quantitative evaluation results of the segmentation on the OASIS dataset.
The results of advanced deep learning methods and our proposed method with 5-fold cross-validation.Using Dice and HD95 as evaluation metrics, we show the segmentation results and mean results for four brain tissues.

Table 3 . Experiment on mis-segmentation region dilation factor on the dHCP dataset.
Dice and HD95 are used as evaluation metrics.dilation N: N represents the dilation factor.

Table 4 . Experiment on the selection of the loss function on the dHCP dataset.
Dice is used as an evaluation metric.

Table 5 . Results of the 5-fold cross-validation of the two-stage training ablation experiment on the dHCP dataset. Dice
is used as the evaluation metric.Global refers to the global loss, MSF refers to the MSF loss, and Total refers to the Global + MSF.

Table 6 . Results of the 5-fold cross-validation of the network architecture improvement abla- tion experiment on the dHCP dataset. Dice
is used as the evaluation metric.The best results are highlighted.